This notebook shows how to train a simple Neural Collaborative Filtering model for recommending movies to users. We also show how the learnt movie embeddings are stored in an approximate similarity matching index, using Spotify's Annoy library, so that we can quickly find and recommend the most relevant movies to a given user. We then show how to use this index to search for similar movies.
In essence, this tutorial works as follows:
1. Train a collaborative filtering model that learns user and movie embeddings from the MovieLens ratings.
2. Extract the learnt movie embeddings from the model checkpoint.
3. Build an approximate similarity matching index of the movie embeddings, using Annoy.
4. Export the trained model so that it can serve the embedding of a given user.
The recommendation is served as follows:
1. Fetch the user's embedding from the exported model.
2. Query the Annoy index for the movies closest to the user's embedding.
3. Filter out the movies the user has already rated, and recommend the rest.
In [ ]:
!pip install annoy
In [1]:
import math
import os
import pandas as pd
import numpy as np
from datetime import datetime
import tensorflow as tf
from tensorflow import data
print "TensorFlow : {}".format(tf.__version__)
SEED = 19831060
In [12]:
DATA_DIR='data'
! wget http://files.grouplens.org/datasets/movielens/ml-latest-small.zip -P data/
! unzip data/ml-latest-small.zip -d data/
TRAIN_DATA_FILE = os.path.join(DATA_DIR, 'ml-latest-small/ratings.csv')
In [13]:
ratings_data = pd.read_csv(TRAIN_DATA_FILE)
ratings_data.describe()
Out[13]:
In [14]:
ratings_data.head()
Out[14]:
In [16]:
movies_data = pd.read_csv(os.path.join(DATA_DIR, 'ml-latest-small/movies.csv'))
movies_data.head()
Out[16]:
In [17]:
HEADER = ['userId', 'movieId', 'rating', 'timestamp']
HEADER_DEFAULTS = [0, 0, 0.0, 0]
TARGET_NAME = 'rating'
num_users = ratings_data.userId.max()
num_movies = movies_data.movieId.max()
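Note that sizing the embedding tables by the maximum ID (rather than the number of distinct IDs) wastes a few rows on unused IDs, but it lets us feed the raw IDs directly to identity categorical columns. As a quick, optional sanity check (a sketch, not part of the original flow), identity columns require non-negative integer IDs:
In [ ]:
# Optional sanity check (sketch): categorical_column_with_identity expects
# non-negative integer IDs below num_buckets.
assert (ratings_data.userId >= 0).all() and (ratings_data.movieId >= 0).all()
assert set(ratings_data.movieId).issubset(set(movies_data.movieId))
print("num_users: {}, num_movies: {}".format(num_users, num_movies))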
In [ ]:
def make_input_fn(file_pattern, batch_size, num_epochs,
                  mode=tf.estimator.ModeKeys.EVAL):

    def _input_fn():
        # Read the ratings CSV into a dataset of (features, rating) pairs,
        # shuffling only during training.
        dataset = tf.data.experimental.make_csv_dataset(
            file_pattern=file_pattern,
            batch_size=batch_size,
            column_names=HEADER,
            column_defaults=HEADER_DEFAULTS,
            label_name=TARGET_NAME,
            field_delim=',',
            use_quote_delim=True,
            header=True,
            num_epochs=num_epochs,
            shuffle=(mode == tf.estimator.ModeKeys.TRAIN)
        )
        return dataset

    return _input_fn
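To verify the input pipeline, you can pull a single batch through a one-shot iterator. This is a sketch assuming TF 1.x graph mode and is not needed for training:
In [ ]:
# Sketch: pull one batch from the input_fn to inspect its structure.
sample_dataset = make_input_fn(TRAIN_DATA_FILE, batch_size=4, num_epochs=1)()
sample_features, sample_target = sample_dataset.make_one_shot_iterator().get_next()
with tf.Session() as sess:
    features_batch, target_batch = sess.run([sample_features, sample_target])
print(features_batch['userId'], target_batch)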
In [19]:
def create_feature_columns(embedding_size):

    feature_columns = []

    # User embedding: one row per possible userId (0..num_users).
    feature_columns.append(
        tf.feature_column.embedding_column(
            tf.feature_column.categorical_column_with_identity(
                'userId', num_buckets=num_users + 1),
            embedding_size
        )
    )

    # Movie embedding: one row per possible movieId (0..num_movies).
    feature_columns.append(
        tf.feature_column.embedding_column(
            tf.feature_column.categorical_column_with_identity(
                'movieId', num_buckets=num_movies + 1),
            embedding_size
        )
    )

    return feature_columns
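Each embedding column maps an integer ID to a dense, trainable vector, and `tf.feature_column.input_layer` emits a `[batch_size, embedding_size]` tensor. A minimal, standalone sketch of the lookup (TF 1.x graph mode assumed; not part of the training flow):
In [ ]:
# Sketch: look up (randomly initialised) embeddings for a tiny batch of user IDs.
with tf.Graph().as_default():
    cols = create_feature_columns(embedding_size=4)
    user_vectors = tf.feature_column.input_layer(
        {'userId': tf.constant([1, 2, 3])}, [cols[0]])
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(user_vectors).shape)  # (3, 4)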
In [20]:
def model_fn(features, labels, mode, params):

    feature_columns = create_feature_columns(params.embedding_size)

    user_layer = tf.feature_column.input_layer(
        features={'userId': features['userId']}, feature_columns=[feature_columns[0]])

    predictions = None
    export_outputs = None
    loss = None
    train_op = None

    if mode == tf.estimator.ModeKeys.PREDICT:
        # At serving time, only the user embedding is returned.
        predictions = {'user_embedding': user_layer}
        export_outputs = {'predictions': tf.estimator.export.PredictOutput(predictions)}
    else:
        movie_layer = tf.feature_column.input_layer(
            features={'movieId': features['movieId']}, feature_columns=[feature_columns[1]])

        # Predicted rating: dot product of the user and movie embeddings,
        # clipped to the valid rating range [0, 5].
        dot_product = tf.keras.layers.Dot(axes=1)([user_layer, movie_layer])
        logits = tf.clip_by_value(dot_product, clip_value_min=0.0, clip_value_max=5.0)

        loss = tf.losses.mean_squared_error(labels, tf.squeeze(logits))
        train_op = tf.train.FtrlOptimizer(params.learning_rate).minimize(
            loss=loss, global_step=tf.train.get_global_step())

    return tf.estimator.EstimatorSpec(
        mode=mode,
        predictions=predictions,
        export_outputs=export_outputs,
        loss=loss,
        train_op=train_op
    )
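The model is essentially matrix factorization: the predicted rating for a (user, movie) pair is the dot product of their embeddings, clipped to the valid rating range. A NumPy illustration of the scoring, with made-up vectors, purely for intuition:
In [ ]:
# Illustration only: score a (user, movie) pair the way the model does.
rng = np.random.RandomState(SEED)
user_vec = rng.normal(size=16)   # stands in for a learnt user embedding
movie_vec = rng.normal(size=16)  # stands in for a learnt movie embedding
predicted_rating = np.clip(user_vec.dot(movie_vec), 0.0, 5.0)
print(predicted_rating)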
In [21]:
def create_estimator(params, run_config):
    estimator = tf.estimator.Estimator(
        model_fn,
        params=params,
        config=run_config
    )
    return estimator
In [22]:
def train_and_evaluate_experiment(params, run_config):

    # TrainSpec ####################################
    train_input_fn = make_input_fn(
        TRAIN_DATA_FILE,
        batch_size=params.batch_size,
        num_epochs=None,
        mode=tf.estimator.ModeKeys.TRAIN
    )

    train_spec = tf.estimator.TrainSpec(
        input_fn=train_input_fn,
        max_steps=params.training_steps
    )
    ###############################################

    # EvalSpec ####################################
    eval_input_fn = make_input_fn(
        TRAIN_DATA_FILE,
        num_epochs=1,
        batch_size=params.batch_size,
    )

    eval_spec = tf.estimator.EvalSpec(
        name=datetime.utcnow().strftime("%H%M%S"),
        input_fn=eval_input_fn,
        steps=None,
        start_delay_secs=0,
        throttle_secs=params.eval_throttle_secs
    )
    ###############################################

    tf.logging.set_verbosity(tf.logging.INFO)

    if tf.gfile.Exists(run_config.model_dir):
        print("Removing previous artefacts...")
        tf.gfile.DeleteRecursively(run_config.model_dir)

    print("")
    estimator = create_estimator(params, run_config)
    print("")

    time_start = datetime.utcnow()
    print("Experiment started at {}".format(time_start.strftime("%H:%M:%S")))
    print(".......................................")

    tf.estimator.train_and_evaluate(
        estimator=estimator,
        train_spec=train_spec,
        eval_spec=eval_spec
    )

    time_end = datetime.utcnow()
    print(".......................................")
    print("Experiment finished at {}".format(time_end.strftime("%H:%M:%S")))
    print("")

    time_elapsed = time_end - time_start
    print("Experiment elapsed time: {} seconds".format(time_elapsed.total_seconds()))

    return estimator
In [23]:
MODELS_LOCATION = 'models/movielens'
MODEL_NAME = 'recommender_01'
model_dir = os.path.join(MODELS_LOCATION, MODEL_NAME)

params = tf.contrib.training.HParams(
    batch_size=265,
    training_steps=1000,
    learning_rate=0.1,
    embedding_size=16,
    eval_throttle_secs=0,
)
run_config = tf.estimator.RunConfig(
    tf_random_seed=SEED,
    save_checkpoints_steps=10000,
    keep_checkpoint_max=3,
    model_dir=model_dir,
)
estimator = train_and_evaluate_experiment(params, run_config)
In [ ]:
def find_embedding_tensor():
    # Load the latest checkpoint and list the trainable variables,
    # so we can locate the movie embedding weights tensor by name.
    checkpoint_path = tf.train.latest_checkpoint(model_dir)
    with tf.Session() as sess:
        saver = tf.train.import_meta_graph(checkpoint_path + '.meta')
        saver.restore(sess, checkpoint_path)
        graph = tf.get_default_graph()
        trainable_tensors = map(str, graph.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES))
        for tensor in set(trainable_tensors):
            print(tensor)

find_embedding_tensor()
In [ ]:
def extract_embeddings():
    # Load the latest checkpoint and read out the movie embedding weights,
    # one row per movieId.
    checkpoint_path = tf.train.latest_checkpoint(model_dir)
    with tf.Session() as sess:
        saver = tf.train.import_meta_graph(checkpoint_path + '.meta')
        saver.restore(sess, checkpoint_path)
        graph = tf.get_default_graph()
        weights_tensor = graph.get_tensor_by_name(
            'input_layer_1/movieId_embedding/embedding_weights:0')
        weights = np.array(sess.run(weights_tensor))

    embeddings = {}
    for i in range(weights.shape[0]):
        embeddings[i] = weights[i]

    return embeddings
In [ ]:
embeddings = extract_embeddings()
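Because the movie feature uses an identity categorical column, row i of the weight matrix is the embedding of movieId i, so the dictionary keys double as movie IDs. A quick shape check (a sketch; the expected sizes follow from the setup above):
In [ ]:
# Sketch: with identity columns, the table has num_movies + 1 rows,
# one embedding_size-dimensional vector per possible movieId.
assert len(embeddings) == num_movies + 1
assert len(embeddings[0]) == params.embedding_size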
In [ ]:
from annoy import AnnoyIndex

def build_embeddings_index(num_trees):

    total_items = 0
    annoy_index = AnnoyIndex(params.embedding_size, metric='angular')

    # Add each movie embedding to the index, keyed by its movieId.
    for item_id in embeddings.keys():
        annoy_index.add_item(item_id, embeddings[item_id])
        total_items += 1

    print("{} items were added to the index".format(total_items))

    annoy_index.build(n_trees=num_trees)
    print("Index is built")

    return annoy_index

index = build_embeddings_index(100)
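Annoy indexes can also be persisted to disk and memory-mapped back, which is handy when serving. A short sketch (the file name movies.ann is arbitrary, not from the original notebook):
In [ ]:
# Sketch: persist the index and memory-map it back (file name is arbitrary).
index.save('movies.ann')
loaded_index = AnnoyIndex(params.embedding_size, metric='angular')
loaded_index.load('movies.ann')
print(loaded_index.get_n_items())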
In [112]:
frequent_movie_ids = list(ratings_data.movieId.value_counts().index[:15])
In [113]:
movies_data[movies_data['movieId'].isin(frequent_movie_ids)]
Out[113]:
In [114]:
def get_similar_movies(movie_id, num_matches=5):
    # Find the nearest movie embeddings to the given movie in the index.
    similar_movie_ids = index.get_nns_by_item(
        movie_id, num_matches, search_k=-1, include_distances=False)
    similar_movies = movies_data[movies_data['movieId'].isin(similar_movie_ids)].title
    return similar_movies
In [115]:
for movie_id in frequent_movie_ids:
    movie_title = movies_data[movies_data['movieId'] == movie_id].title.values[0]
    print("Movie: {}".format(movie_title))
    similar_movies = get_similar_movies(movie_id)
    print("Similar Movies:")
    print(similar_movies)
    print("--------------------------------------")
In [146]:
def make_serving_input_receiver_fn():
    # The exported model accepts a batch of raw userIds.
    return tf.estimator.export.build_raw_serving_input_receiver_fn(
        {'userId': tf.placeholder(shape=[None], dtype=tf.int32)}
    )

export_dir = os.path.join(model_dir, 'export')

if tf.gfile.Exists(export_dir):
    tf.gfile.DeleteRecursively(export_dir)

estimator.export_savedmodel(
    export_dir_base=export_dir,
    serving_input_receiver_fn=make_serving_input_receiver_fn()
)
Out[146]:
In [147]:
export_dir = os.path.join(model_dir, "export")

# The SavedModel lives in a timestamped subdirectory of the export dir.
saved_model_dir = os.path.join(
    export_dir, [f for f in os.listdir(export_dir) if f.isdigit()][0])
print(saved_model_dir)

predictor_fn = tf.contrib.predictor.from_saved_model(
    export_dir=saved_model_dir,
)

output = predictor_fn({'userId': [1]})
print(output)
In [190]:
def recommend_new_movies(userId, num_recommendations=5):
    # Fetch extra neighbours so that, after excluding already-watched
    # movies, enough new recommendations remain.
    watched_movie_ids = list(ratings_data[ratings_data['userId'] == userId]['movieId'])
    user_embedding = predictor_fn({'userId': [userId]})['user_embedding'][0]
    similar_movie_ids = index.get_nns_by_vector(
        user_embedding, num_recommendations + len(watched_movie_ids),
        search_k=-1, include_distances=False)
    recommended_movie_ids = set(similar_movie_ids) - set(watched_movie_ids)
    similar_movies = movies_data[movies_data['movieId'].isin(recommended_movie_ids)].title
    return similar_movies
In [191]:
frequent_user_ids = list(ratings_data.userId.value_counts().index[-350:])[:5]
print(recommend_new_movies(418))
Author: Khalid Salama
Disclaimer: This is not an official Google product. This sample code is provided for educational purposes only.
Copyright 2019 Google LLC
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.